
    From Data Quality to Big Data Quality

    This article investigates the evolution of data quality issues from traditional structured data managed in relational databases to Big Data. In particular, the paper examines the relationship between data quality and several research coordinates that are relevant to Big Data, such as the variety of data types, data sources and application domains, focusing on maps, semi-structured texts, linked open data, sensors and sensor networks, and official statistics. Consequently, a set of structural characteristics is identified, and the a posteriori correlation between these characteristics and quality dimensions is systematized. Finally, Big Data quality issues are considered in a conceptual framework suitable for mapping the evolution of the quality paradigm along three core coordinates that are significant in the context of the Big Data phenomenon: the data type considered, the source of data, and the application domain. Through this integrative, theoretical literature review, the framework makes it possible to ascertain the relevant changes in data quality that emerge with the Big Data phenomenon.

    Methodology for Assessment of Linked Data Quality

    With the expansion in the amount of data being produced as Linked Data (LD), the opportunity to build use cases has also increased. However, a crippling problem for the reliability of these use cases is the poor quality of the underlying data. Moreover, the ability to assess the quality of the consumed LD, based on how well it satisfies the consumers' quality requirements, significantly influences the usability of such data for a given use case. In this paper, we propose a data quality assessment methodology specifically designed for LD. This methodology consists of three phases and six steps, with specific emphasis on considering a use case.
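    The abstract ties LD quality assessment to whether a dataset satisfies the consumer's quality requirements for a given use case, but it does not describe the three phases or six steps. The sketch below only illustrates the requirement-driven idea under stated assumptions: each requirement is modelled as a hypothetical (metric name, minimum score) pair, metric scores in [0, 1] are assumed to have been computed elsewhere, and the metric names are illustrative, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """Hypothetical consumer requirement: a metric name and its minimum acceptable score."""
    metric: str
    threshold: float


def assess(scores: dict[str, float], requirements: list[Requirement]) -> dict[str, bool]:
    """Report, per required metric, whether the dataset's score meets the threshold.

    A metric that was never measured is treated as unsatisfied (score 0.0).
    """
    return {r.metric: scores.get(r.metric, 0.0) >= r.threshold for r in requirements}


def fit_for_use(scores: dict[str, float], requirements: list[Requirement]) -> bool:
    """A dataset is fit for the use case only if every requirement is satisfied."""
    return all(assess(scores, requirements).values())


if __name__ == "__main__":
    # Illustrative scores for one dataset; the metrics are assumptions.
    scores = {"completeness": 0.92, "availability": 0.75}
    reqs = [Requirement("completeness", 0.9), Requirement("availability", 0.8)]
    print(assess(scores, reqs))       # {'completeness': True, 'availability': False}
    print(fit_for_use(scores, reqs))  # False
```

    The same scores can pass for one consumer and fail for another, which is the point of making the requirements, not the dataset, the input that varies per use case.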

    A Multi-perspective Model of Smart Products for Designing Web-Based Services on the Production Chain

    In this paper, we propose a multi-perspective model for smart products as a basis for implementing advanced web-based services in the smart factory production chain. The model relies on three perspectives: the product, the production process and the work centers involved in the production chain. For each perspective, the physical world is connected with the cyber world, where collected sensor data are organised to enable data analysis and exploration across the three perspectives in an interleaved way. A portfolio of web-based services at the production chain level is also described according to the model. Among them, a data access dashboard has been designed to enable controlled exploration of production chain data by the different actors involved in the production activities, ranging from the suppliers to the producer of the final product. The approach targets the production of costly and complex products, where conceptualising a Smart Product as the integration of product, process and infrastructure sensor data aims to ensure high product quality, long-lasting operation, less frequent and more efficient maintenance, and consistent performance over time.
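    One way to picture "exploration according to the three perspectives in an interleaved way" is a store that indexes each sensor reading under all three perspectives at once. This is a minimal sketch, not the paper's model: the `Reading` fields, the `SmartProductStore` class and the perspective keys are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One sensor observation tagged with all three (hypothetical) perspective keys."""
    product_id: str
    process_step: str
    work_center: str
    sensor: str
    value: float


class SmartProductStore:
    """Indexes the same readings under the product, process and work-center perspectives."""

    PERSPECTIVES = ("product", "process", "work_center")

    def __init__(self) -> None:
        self._index = {p: defaultdict(list) for p in self.PERSPECTIVES}

    def add(self, r: Reading) -> None:
        # The same reading is reachable from every perspective, so views can be interleaved.
        self._index["product"][r.product_id].append(r)
        self._index["process"][r.process_step].append(r)
        self._index["work_center"][r.work_center].append(r)

    def view(self, perspective: str, key: str) -> list[Reading]:
        """Return readings for one key in one perspective (empty list if none)."""
        if perspective not in self._index:
            raise ValueError(f"unknown perspective: {perspective}")
        # .get() avoids inserting empty entries into the defaultdict.
        return list(self._index[perspective].get(key, []))


if __name__ == "__main__":
    store = SmartProductStore()
    store.add(Reading("P-42", "milling", "WC-1", "spindle_temp", 61.5))
    store.add(Reading("P-42", "assembly", "WC-2", "torque", 12.0))
    # Interleaved exploration: start from the product, then pivot to one work center.
    for r in store.view("product", "P-42"):
        print(r.process_step, r.work_center, r.sensor, r.value)
    print([r.sensor for r in store.view("work_center", "WC-2")])
```

    Storing references to the same immutable `Reading` in three indexes keeps the perspectives consistent; a dashboard with access control, as described in the abstract, would additionally filter `view` results by actor.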

    Proceedings of the 2nd Workshop on Linked Data Quality (LDQ)


    Towards green linked data

    We present a vision of what needs to be addressed when designing and publishing linked data on the Web. Our approach aims to reduce the amount of incorrect, irrelevant or redundant content, which can also be seen as pollution in the Web of Data, when publishing linked data. At its foundation lie design principles adapted from green engineering. We envision a holistic framework that evaluates datasets from publishers along these principles and their respective assessment metrics, and that allows the configuration of new validation tools.

    Capturing the Age of Linked Open Data: Towards a Dataset-Independent Framework

    An increasing amount of data is published and consumed on the Web according to the Linked Data paradigm. In such a scenario, understanding whether the consumed data are up to date is crucial. Outdated data are usually considered inappropriate for many crucial tasks, such as making the consumer confident that the answers returned to a query are still valid at the time the query is formulated. In this paper we present a first dataset-independent framework for assessing the currency of Linked Open Data (LOD) graphs. Starting from an analysis of the 8,713,282 triples containing temporal metadata in the Billion Triple Challenge 2011 dataset, we investigate which vocabularies are used to represent versioning metadata and define Onto Currency, an ontology that integrates the most frequent properties used in this domain and supports the collection of metadata from datasets that use different vocabularies. The proposed framework uses this ontology to assess the currency of an RDF graph (or statement) by extrapolating it from the currency of the documents that describe the resources occurring in the graph (or statement). The approach has been implemented and evaluated in two different scenarios. © 2012 IEEE
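    The abstract derives the currency of a statement from the currency of the documents describing its resources, without giving formulas. The sketch below assumes a common age-based definition, currency = 1 - (t_obs - t_mod) / (t_obs - t_start), clamped to [0, 1], where t_mod is a document's last-modification time (for example, read from a property such as dcterms:modified), t_obs is the observation time and t_start is an assumed reference start time. Averaging the document scores into a statement score is also an assumption made here for illustration, not necessarily the paper's aggregation.

```python
from datetime import datetime


def document_currency(last_modified: datetime, start: datetime, observed: datetime) -> float:
    """Assumed age-based currency in [0, 1].

    1.0 means modified at the observation time; 0.0 means last modified at or before `start`.
    """
    span = (observed - start).total_seconds()
    if span <= 0:
        raise ValueError("observation time must be after the start time")
    age = (observed - last_modified).total_seconds()
    return min(1.0, max(0.0, 1.0 - age / span))


def statement_currency(doc_last_modified: list[datetime], start: datetime, observed: datetime) -> float:
    """Hypothetical aggregation: mean currency of the documents describing the statement's resources."""
    if not doc_last_modified:
        raise ValueError("no describing documents with temporal metadata")
    scores = [document_currency(t, start, observed) for t in doc_last_modified]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    start = datetime(2011, 1, 1)
    observed = datetime(2011, 1, 11)  # 10-day window
    docs = [datetime(2011, 1, 6), datetime(2011, 1, 11)]  # 5 days old, and fresh
    print(document_currency(docs[0], start, observed))  # 0.5
    print(statement_currency(docs, start, observed))    # 0.75
```

    Because the score depends only on timestamps, it can be computed for any dataset once its versioning properties are mapped to a single last-modified notion, which is the role the abstract assigns to the Onto Currency ontology.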